NASA Spatio-temporal Data

The data are geographic and atmospheric measures on a very coarse 24 by 24 grid covering Central America. They were obtained from the NASA Langley Research Center - Atmospheric Science Data Center.

Storage

The data are stored in a data cube tbl from the dplyr package (tbl_cube); in dplyr 1.0.0 and later, tbl_cube and the nasa data set are provided by the cubelyr package instead. A tbl_cube is built from two arguments: dimensions and measures. dimensions is a named list of vectors, while measures is a named list of arrays.

Dimensions - latitude, longitude, month, and year.

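Measures - Cloud coverage (high, middle, and low), ozone, surface temperature, temperature, and pressure. Counting the three cloud levels separately gives the 7 measures reported below.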

summary(nasa)
##      Length Class  Mode
## mets 7      -none- list
## dims 4      -none- list
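
To make the dimensions/measures layout concrete, here is a minimal base R sketch of a toy cube. The names toy_dims and toy_mets and all of the values are invented for illustration. The sketch assumes that flattening a cube lists every combination of dimension values with the first dimension varying fastest, which is how expand.grid() behaves.

```r
# Toy cube: 2 latitudes x 3 longitudes x 2 months (all values invented)
toy_dims <- list(lat = c(10, 20), long = c(-100, -90, -80), month = 1:2)

# One array per measure, with one axis per dimension
toy_mets <- list(
  temperature = array(seq(290, 301, length.out = 12), dim = c(2, 3, 2))
)

# Flatten: every combination of dimension values, first dimension fastest
toy_df <- expand.grid(toy_dims, KEEP.OUT.ATTRS = FALSE)
toy_df$temperature <- as.vector(toy_mets$temperature)

nrow(toy_df)   # 2 * 3 * 2 = 12 rows
head(toy_df, 3)

# The same arithmetic for the NASA cube: 24 x 24 grid, 12 months, 6 years
24 * 24 * 12 * 6   # 41472 rows
```

The flattened form is what as.data.frame() produces for the real cube, with one row per combination of latitude, longitude, month, and year.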

Variables

Latitude/Longitude

Latitude and longitude specify precise locations on the surface of the Earth. In our case, the 576 points are evenly spaced on a 24 by 24 grid.

Month/Year

Each measure was recorded every month for 6 years (1995 to 2000) at all 576 locations.

Cloud Coverage

Cloud coverage refers to the fraction of the sky obscured by clouds from a particular location. The higher the number, the greater the coverage. It is recorded at three different levels per location: low, medium, and high.

Ozone

Ozone protects us from the sun’s UV rays and is measured in Dobson units (DU). Each Dobson unit corresponds to a layer of pure ozone 0.01 millimeters thick at standard temperature and pressure. For reference, the total ozone in a column of the atmosphere is around 300 DU.

Pressure

Pressure refers to the atmospheric pressure measured in millibars. For reference, the average pressure at sea level is around 1013 millibars.

Temperature

Temperature is the degree or intensity of heat in the atmosphere. It is measured in Kelvin rather than Celsius or Fahrenheit.

Surface Temperature

Surface temperature is the temperature measured on the surface of Earth. This is also measured in Kelvin.
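
For readers more used to Celsius or millimeters, the units described above can be converted with simple arithmetic. The helper names kelvin_to_celsius and dobson_to_mm are hypothetical and are not part of the original analysis.

```r
# Hypothetical helpers for the units described above
kelvin_to_celsius <- function(k) k - 273.15   # 0 degrees Celsius = 273.15 K
dobson_to_mm <- function(du) du * 0.01        # 1 DU = 0.01 mm of pure ozone

kelvin_to_celsius(300.5)   # a temperature value that appears in the data set
dobson_to_mm(300)          # a typical ozone column, in millimeters
```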

Overview

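library(dplyr)
dfnasa <- as.data.frame(nasa)  # one row per lat/long/month/year combination
# Rows 1, 577, ..., 6337 are 576 apart: one grid point in each of 12 months
year1 <- slice(dfnasa,1,577,1153,1729,2305,2881,3457,4033,4609,5185,5761,6337)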

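I converted the nasa data into a data frame. Using slice(), I split the data by year from 1995 to 2000. The row numbers are 576 apart because each month occupies 576 consecutive rows, one per grid point, so year1 holds the 12 monthly rows for a single grid point.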
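
The row numbers passed to slice() follow a regular pattern, so they can be generated rather than typed. The sketch below assumes the flattened data frame is ordered with latitude and longitude varying fastest, then month, then year. The helper month_rows is hypothetical.

```r
points_per_month <- 24 * 24              # 576 grid points
rows_per_year <- points_per_month * 12   # 6912 rows

# Hypothetical helper: rows for grid point `point` in every month of year number `yr`
month_rows <- function(yr, point = 1) {
  start <- (yr - 1) * rows_per_year + point
  seq(from = start, by = points_per_month, length.out = 12)
}

month_rows(1)   # 1 577 1153 ... 6337, the indices used for year1
month_rows(2)   # starts at row 6913 under the ordering assumed above
```

Under that assumption, slice(dfnasa, month_rows(2)) would give the equivalent of year1 for the second year.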

Using the highcharter package, I created several line graphs to explore basic relationships within each year.

Cloud Coverage

library(highcharter)

highchart() %>%
  hc_xAxis(categories = year1$month) %>% 
  hc_add_series(name = "High Cloud", data = year1$cloudhigh) %>% 
  hc_add_series(name = "Low Cloud", data = year1$cloudlow) %>%
  hc_add_series(name = "Medium Cloud", data = year1$cloudmid) %>%
  hc_add_theme(hc_theme_538())

This line graph compares the three cloud coverage levels (low, medium, and high) by month.

Temperatures and Ozone

highchart() %>%
  hc_xAxis(categories = year1$month) %>% 
  hc_add_series(name = "Temperature", data = year1$temperature) %>% 
  hc_add_series(name = "Surface Temperature", data = year1$surftemp) %>%
  hc_add_series(name = "Ozone", data = year1$ozone) %>%
  hc_add_theme(hc_theme_economist())

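This line graph compares temperature, surface temperature, and ozone by month. Note that the temperatures are in Kelvin and ozone is in Dobson units, so the shared y-axis mixes units.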

Air Pressure

highchart() %>%
  hc_xAxis(categories = year1$month) %>%
  hc_add_series(name = "Pressure", data = year1$pressure) %>%
  hc_add_theme(hc_theme_ffx())

This line graph shows any general trends in the pressure data.

Summary of Data

#Mean
#Median
#Relationships etc
#Stuff to point out
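
As one possible starting point for this summary, the sketch below computes the mean and median of each measure in base R. To stay self-contained it uses only the six sample rows printed in the Predictions section, stored in a hypothetical data frame sample_rows. Applying the same sapply() call to the measure columns of dfnasa would summarize the full data set.

```r
# Six rows copied from the head(temp_pred) output in this report
sample_rows <- data.frame(
  cloudhigh   = c(31.0, 8.0, 9.5, 2.0, 46.5, 5.5),
  cloudlow    = c(6.5, 10.0, 15.5, 17.0, 13.5, 23.0),
  cloudmid    = c(27.0, 24.5, 14.0, 7.0, 23.5, 8.5),
  ozone       = c(244, 250, 326, 260, 244, 254),
  pressure    = c(1000, 680, 925, 1000, 1000, 1000),
  surftemp    = c(294.6, 288.3, 297.4, 297.8, 301.4, 302.8),
  temperature = c(300.5, 285.8, 291.7, 299.2, 301.4, 302.3)
)

# Mean and median of every column: one row per statistic, one column per measure
stats <- sapply(sample_rows, function(x) c(mean = mean(x), median = median(x)))
round(stats, 2)
```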

Using Latitudes and Longitudes

Temperature Distribution

library(leaflet)

# Coordinates of the 576 grid points in the first 576 rows of the data
dflat <- dfnasa$lat[1:576]
dflong <- dfnasa$long[1:576]

qpal <- colorFactor(c("blue","royalblue","lightskyblue", "yellow","orange","red"), domain = dfnasa$temperature[1:576])

leaflet(slice(dfnasa,1:576)) %>%
  addTiles() %>%
  setView(lng = -90, lat =  10, zoom = 3) %>%
  # Build the popup from the 576 plotted rows rather than the full data frame
  addCircleMarkers(lat = dflat, lng = dflong, color = ~qpal(temperature), stroke = FALSE, fillOpacity = 0.3,
                   popup = ~paste("Temperature:", temperature))

Using the leaflet package, I mapped the temperature at each point of the 24 by 24 grid for the first 576 rows of the data, which correspond to a single month. Red shows hotter temperatures and blue shows cooler temperatures.

Air Pressure Distribution

# Coordinates of the 576 grid points in the first 576 rows of the data
dflat <- dfnasa$lat[1:576]
dflong <- dfnasa$long[1:576]

qpal <- colorFactor(c("blue","yellow2","orange","firebrick1"), domain = dfnasa$pressure[1:576])

leaflet(slice(dfnasa,1:576)) %>%
  addTiles() %>%
  setView(lng = -90, lat =  10, zoom = 3) %>%
  # Build the popup from the 576 plotted rows rather than the full data frame
  addCircleMarkers(lat = dflat, lng = dflong, color = ~qpal(pressure), stroke = FALSE, fillOpacity = 0.5,
                   popup = ~paste("Pressure:", pressure))

This map shows the distribution of air pressure across the 24 by 24 grid.

Pearson’s Correlation

Pearson’s correlation was used to clarify the relationships between the variables. In the plot below, blue numbers indicate a positive correlation and red numbers indicate a negative correlation. The closer the number is to -1 or 1, the stronger the correlation.

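library(corrplot)

# combine is created outside this section; the 72 observations implied by the
# model summary further down suggest it stacks the six yearly slices
selected_var <- combine %>%
  select(cloudhigh,cloudmid,cloudlow,ozone,pressure,temperature,surftemp)
# Pearson correlation matrix (the default method of cor())
corr_nasa <- cor(selected_var)
corrplot(corr_nasa,method = "number")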



Correlation with selected variables (cloud coverage, ozone, pressure, temperatures).

The correlation table shows two groups of variables. Low cloud coverage, temperature, and surface temperature were positively correlated with one another, as were high cloud coverage, middle cloud coverage, and ozone. Air pressure did not seem to correlate with any of the other variables.
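
For reference, Pearson’s r is the covariance of two variables divided by the product of their standard deviations. The sketch below computes it by hand and checks it against cor(), using the six sample rows of surface temperature and temperature printed in the Predictions section. Six rows are far too few to draw conclusions from, so this only illustrates the formula.

```r
surftemp    <- c(294.6, 288.3, 297.4, 297.8, 301.4, 302.8)
temperature <- c(300.5, 285.8, 291.7, 299.2, 301.4, 302.3)

# Pearson's r from its definition, using deviations from the mean
dx <- surftemp - mean(surftemp)
dy <- temperature - mean(temperature)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

r_manual
all.equal(r_manual, cor(surftemp, temperature))   # cor() defaults to Pearson
```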

Linear Regressions

I created several linear regression models using temperature as the dependent variable.

Temperature vs. Low Cloud Coverage

library(ggplot2)

temp_lowc <- lm(temperature ~ cloudlow,data = combine)
temp_lowc
## 
## Call:
## lm(formula = temperature ~ cloudlow, data = combine)
## 
## Coefficients:
## (Intercept)     cloudlow  
##    277.7133       0.6569
lowcg <- ggplot(combine,aes(x=cloudlow,y=temperature))+geom_point()+xlab("Low Cloud Coverage")+ylab("Temperature")+geom_abline(intercept=277.7133,slope=0.6569,col="indianred3")
lowcg

Temperature vs. Middle Cloud Coverage

temp_midc <- lm(temperature ~ cloudmid,data = combine)
temp_midc
## 
## Call:
## lm(formula = temperature ~ cloudmid, data = combine)
## 
## Coefficients:
## (Intercept)     cloudmid  
##     313.376       -1.168
midcg <- ggplot(combine,aes(x=cloudmid,y=temperature))+geom_point()+xlab("Middle Cloud Coverage")+ylab("Temperature")+geom_abline(intercept=313.376,slope=-1.168,col="indianred3")
midcg

Temperature vs. High Cloud Coverage

temp_highc <- lm(temperature ~ cloudhigh,data = combine)
temp_highc
## 
## Call:
## lm(formula = temperature ~ cloudhigh, data = combine)
## 
## Coefficients:
## (Intercept)    cloudhigh  
##    298.8933      -0.8519
highcg <- ggplot(combine,aes(x=cloudhigh,y=temperature))+geom_point()+xlab("High Cloud Coverage")+ylab("Temperature")+geom_abline(intercept=298.8933,slope=-0.8519,col="indianred3")
highcg

Temperature vs. Ozone

temp_ozone <- lm(temperature ~ ozone,data = combine)
temp_ozone 
## 
## Call:
## lm(formula = temperature ~ ozone, data = combine)
## 
## Coefficients:
## (Intercept)        ozone  
##    337.8292      -0.1526
ozoneg <- ggplot(combine,aes(x=ozone,y=temperature))+geom_point()+xlab("Ozone Level")+ylab("Temperature")+geom_abline(intercept=337.8292,slope=-0.1526,col="indianred3")
ozoneg

Temperature vs. Surface Temperature

temp_surftemp <- lm(temperature ~ surftemp,data = combine)
temp_surftemp 
## 
## Call:
## lm(formula = temperature ~ surftemp, data = combine)
## 
## Coefficients:
## (Intercept)     surftemp  
##     86.0301       0.7002
surfg <- ggplot(combine,aes(x=surftemp,y=temperature))+geom_point()+xlab("Surface Temperature")+ylab("Temperature")+geom_abline(intercept=86.0301,slope=0.7002,col="indianred3")
surfg

Temperature vs. Pressure

temp_pres <- lm(temperature ~ pressure,data = combine)
temp_pres 
## 
## Call:
## lm(formula = temperature ~ pressure, data = combine)
## 
## Coefficients:
## (Intercept)     pressure  
##   259.84484      0.03408
presg <- ggplot(combine,aes(x=pressure,y=temperature))+geom_point()+xlab("Atmospheric Pressure")+ylab("Temperature")+geom_abline(intercept=259.84484,slope=0.03408,col="indianred3")
presg



The ggarrange() function (assumed here to come from the ggpubr package) combines all six graphs into one figure.

figure <- ggarrange(lowcg,midcg,highcg,ozoneg,surfg,presg ,ncol = 3,nrow=2)
figure
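
The regression lines above are drawn from coefficients typed in by hand, which can drift out of sync if the data change. One alternative is to read the intercept and slope from the fitted model with coef(). It is sketched here in base R on the six sample rows printed in the Predictions section, not on the combine data.

```r
sample_rows <- data.frame(
  cloudlow    = c(6.5, 10.0, 15.5, 17.0, 13.5, 23.0),
  temperature = c(300.5, 285.8, 291.7, 299.2, 301.4, 302.3)
)

fit <- lm(temperature ~ cloudlow, data = sample_rows)
b <- coef(fit)   # named vector: "(Intercept)" and "cloudlow"

# Base graphics version of the scatter plot with the fitted line
plot(sample_rows$cloudlow, sample_rows$temperature,
     xlab = "Low Cloud Coverage", ylab = "Temperature")
abline(a = b[["(Intercept)"]], b = b[["cloudlow"]], col = "indianred3")
```

With ggplot2, the same values can be passed to geom_abline(intercept = b[["(Intercept)"]], slope = b[["cloudlow"]]), or geom_smooth(method = "lm", se = FALSE) can fit and draw the line in one step.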

Regression with Multiple Variables

From the linear regressions, pressure was the only variable that did not appear to correlate with temperature. Therefore, the multiple linear regression model excludes it as a predictor.

model <- lm(temperature ~ cloudlow+cloudmid+cloudhigh+ozone+surftemp,data=combine)
summary(model)
## 
## Call:
## lm(formula = temperature ~ cloudlow + cloudmid + cloudhigh + 
##     ozone + surftemp, data = combine)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7001 -1.7232 -0.0064  1.7982  4.8737 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 20.43522   22.89516   0.893    0.375    
## cloudlow    -0.65061    0.10505  -6.194 4.27e-08 ***
## cloudmid     0.16998    0.10875   1.563    0.123    
## cloudhigh   -0.43951    0.08269  -5.315 1.35e-06 ***
## ozone        0.01669    0.02003   0.833    0.408    
## surftemp     0.95383    0.06855  13.915  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.427 on 66 degrees of freedom
## Multiple R-squared:  0.9261, Adjusted R-squared:  0.9205 
## F-statistic: 165.3 on 5 and 66 DF,  p-value: < 2.2e-16
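
Two details of this summary can be checked with a little arithmetic. The sketch below copies the p-values from the output above (with the reported value "< 2e-16" entered as 2e-16) and lists the terms that are significant at the 0.05 level. It then recomputes the adjusted R-squared from the reported R-squared, the 5 predictors, and the 66 residual degrees of freedom.

```r
# Values copied from the summary(model) output
p_values <- c(cloudlow = 4.27e-08, cloudmid = 0.123, cloudhigh = 1.35e-06,
              ozone = 0.408, surftemp = 2e-16)  # surftemp is reported as < 2e-16

# Predictors whose p-value is below 0.05
significant <- names(p_values)[p_values < 0.05]
significant   # "cloudlow" "cloudhigh" "surftemp"

# Number of observations: residual degrees of freedom + predictors + intercept
n_pred <- 5
n_obs <- 66 + n_pred + 1   # 72

# Adjusted R-squared penalizes R-squared for the number of predictors
r2 <- 0.9261
adj_r2 <- 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_pred - 1)
round(adj_r2, 4)   # 0.9205, matching the reported value
```

In this output, cloudmid and ozone are not significant once the other predictors are included, and the 72 observations correspond to 12 months over 6 years.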

Predictions

I chose 50 random rows from the NASA data set (the first six are listed below):

temp_pred <- sample_n(dfnasa,50) 
head(temp_pred)
##          lat       long month year cloudhigh cloudlow cloudmid ozone
## 1 -11.217391  -71.22609     3 1998      31.0      6.5     27.0   244
## 2 -16.208696  -71.22609     5 1996       8.0     10.0     24.5   250
## 3  33.704348 -106.28696     5 1999       9.5     15.5     14.0   326
## 4   8.747826 -106.28696     1 2000       2.0     17.0      7.0   260
## 5   3.756522  -96.26957     4 1998      46.5     13.5     23.5   244
## 6  13.739130  -93.76522     4 1997       5.5     23.0      8.5   254
##   pressure surftemp temperature
## 1     1000    294.6       300.5
## 2      680    288.3       285.8
## 3      925    297.4       291.7
## 4     1000    297.8       299.2
## 5     1000    301.4       301.4
## 6     1000    302.8       302.3

Data frame of 50 random rows from the NASA data set.

model_usage <- temp_pred %>% select(cloudhigh,cloudlow,cloudmid,ozone,surftemp)
real_temp <- temp_pred %>% select(temperature)

head(model_usage)
##   cloudhigh cloudlow cloudmid ozone surftemp
## 1      31.0      6.5     27.0   244    294.6
## 2       8.0     10.0     24.5   250    288.3
## 3       9.5     15.5     14.0   326    297.4
## 4       2.0     17.0      7.0   260    297.8
## 5      46.5     13.5     23.5   244    301.4
## 6       5.5     23.0      8.5   254    302.8

The predictor columns in model_usage are passed to the model to generate predictions, while the actual temperatures are stored in real_temp for comparison.

library(modelr)  # provides add_predictions()

model_predictions <- model_usage %>% add_predictions(model)

head(model_predictions)
##   cloudhigh cloudlow cloudmid ozone surftemp     pred
## 1      31.0      6.5     27.0   244    294.6 292.2409
## 2       8.0     10.0     24.5   250    288.3 293.7384
## 3       9.5     15.5     14.0   326    297.4 297.6639
## 4       2.0     17.0      7.0   260    297.8 298.0748
## 5      46.5     13.5     23.5   244    301.4 286.7654
## 6       5.5     23.0      8.5   254    302.8 297.5569
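
Each value in pred is the intercept plus the sum of each coefficient times its predictor. As a check, the sketch below recomputes the first prediction by hand from the coefficients printed in the model summary. Because those printed coefficients are rounded, the result agrees with add_predictions() only to about two decimal places.

```r
# Rounded coefficients copied from summary(model)
coefs <- c(intercept = 20.43522, cloudlow = -0.65061, cloudmid = 0.16998,
           cloudhigh = -0.43951, ozone = 0.01669, surftemp = 0.95383)

# First row of model_usage
row1 <- c(cloudlow = 6.5, cloudmid = 27.0, cloudhigh = 31.0,
          ozone = 244, surftemp = 294.6)

# Intercept plus the coefficient-weighted sum of the predictors
pred1 <- coefs[["intercept"]] + sum(coefs[names(row1)] * row1)
round(pred1, 2)   # 292.24, against 292.2409 from add_predictions()
```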

Graph of Predictions

actual_preddf <- data.frame(cbind(real_temp, model_predictions$pred))
colnames(actual_preddf) = c("real","prediction")

ggplotly(ggplot(actual_preddf)+geom_point(aes(x=real,y=prediction))+
  geom_abline(intercept=0,slope=1,col="darkturquoise",size=1)+
  xlab("Real Temperature")+ylab("Predicted Temperature"))
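
The scatter plot can be complemented with a single-number summary of prediction error. The sketch below computes the root mean squared error (RMSE) and the mean absolute error (MAE) for only the six rows printed above. It illustrates the calculation and is not a result for all 50 sampled rows.

```r
# Actual and predicted temperatures (Kelvin) for the six rows shown earlier
real <- c(300.5, 285.8, 291.7, 299.2, 301.4, 302.3)
prediction <- c(292.2409, 293.7384, 297.6639, 298.0748, 286.7654, 297.5569)

errors <- prediction - real

rmse <- sqrt(mean(errors^2))   # penalizes large misses more heavily
mae <- mean(abs(errors))       # average size of a miss

round(c(rmse = rmse, mae = mae), 2)
```

For the full sample, the same calculation can be applied to the real and prediction columns of actual_preddf.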